Movement-Specific Analysis for FIM Score Classification Using Spatio-Temporal Deep Learning
Masaki, Jun, Higashi, Ariaki, Shinagawa, Naoko, Hirata, Kazuhiko, Kurita, Yuichi, Furui, Akira
The functional independence measure (FIM) is widely used to evaluate patients' physical independence in activities of daily living. However, traditional FIM assessment imposes a significant burden on both patients and healthcare professionals. To address this challenge, we propose an automated FIM score estimation method that utilizes simple exercises different from the designated FIM assessment actions. Our approach employs a deep neural network architecture integrating a spatio-temporal graph convolutional network (ST-GCN), bidirectional long short-term memory (BiLSTM), and an attention mechanism to estimate FIM motor item scores. The model effectively captures long-term temporal dependencies and identifies key body-joint contributions through learned attention weights. We evaluated our method in a study of 277 rehabilitation patients, focusing on FIM transfer and locomotion items. Our approach successfully distinguishes between completely independent patients and those requiring assistance, achieving balanced accuracies of 70.09-78.79% across different FIM items. Additionally, our analysis reveals specific movement patterns that serve as reliable predictors for particular FIM evaluation items.
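As a minimal sketch of the attention step described above (pooling per-frame features, e.g. BiLSTM outputs, with learned weights before classification; all names and shapes here are illustrative, not taken from the paper):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_pool(frame_feats, score_w):
    """Pool a sequence of per-frame feature vectors into one vector.

    frame_feats: list of T feature vectors (one per time step);
    score_w: a scoring vector standing in for the learned attention
    parameters. Returns the pooled feature and the per-frame weights,
    which is how the model exposes which frames (or joints) mattered.
    """
    scores = [sum(w * f for w, f in zip(score_w, feat)) for feat in frame_feats]
    weights = softmax(scores)  # positive, sums to 1 over time
    dim = len(frame_feats[0])
    pooled = [sum(weights[t] * frame_feats[t][d] for t in range(len(frame_feats)))
              for d in range(dim)]
    return pooled, weights
```

Inspecting `weights` is the analogue of reading off which movements contributed most to a given FIM item prediction.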
Occlusion-Aware Diffusion Model for Pedestrian Intention Prediction
Liu, Yu, Liu, Zhijie, Yang, Zedong, Li, You-Fu, Kong, He
Predicting pedestrian crossing intentions is crucial for the navigation of mobile robots and intelligent vehicles. Although recent deep learning-based models have shown significant success in forecasting intentions, few consider incomplete observation under occlusion scenarios. To tackle this challenge, we propose an Occlusion-Aware Diffusion Model (ODM) that reconstructs occluded motion patterns and leverages them to guide future intention prediction. During the denoising stage, we introduce an occlusion-aware diffusion transformer architecture to estimate noise features associated with occluded patterns, thereby enhancing the model's ability to capture contextual relationships in occluded semantic scenarios. Furthermore, an occlusion mask-guided reverse process is introduced to effectively utilize observation information, reducing the accumulation of prediction errors and enhancing the accuracy of reconstructed motion features. The performance of the proposed method under various occlusion scenarios is comprehensively evaluated and compared with existing methods on popular benchmarks, namely PIE and JAAD. Extensive experimental results demonstrate that the proposed method achieves more robust performance than existing methods in the literature.
With the rapid advancement of intelligent sensing and computing technologies, much progress has been made in recent years in developing autonomous vehicles to enhance traffic efficiency and road safety. To prevent collisions, path planning of autonomous vehicles [1], [2] is essential, requiring an understanding of interactions between road users and the ability to forecast their potential actions [3]-[5].
This manuscript has been accepted to the IEEE Transactions on Intelligent Transportation Systems as a regular paper. Yu Liu is also with the Department of Mechanical Engineering, City University of Hong Kong, Hong Kong SAR, China. You-Fu Li is with the Department of Mechanical Engineering, City University of Hong Kong, Hong Kong SAR, China.
Figure caption: The typical scenario of visual occlusion is illustrated here. Solid green lines represent the parts of the observation that are within the field of view and visible, while dashed red lines indicate positional features that are undetectable due to occlusion.
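The mask-guided reverse process can be illustrated with a one-line correction step: wherever the trajectory was actually observed, the known observation overrides the model's current estimate, so denoising errors cannot accumulate on visible frames. This is a generic sketch of that idea, not the paper's exact update rule:

```python
def mask_guided_step(x_est, observed, visible_mask):
    """One mask-guided correction inside a reverse (denoising) pass.

    x_est: the model's current per-frame estimates;
    observed: the raw observations (only meaningful where visible);
    visible_mask: True where the frame was within the field of view.
    Visible frames are clamped to their observations; occluded frames
    keep the reconstructed value.
    """
    return [obs if vis else est
            for est, obs, vis in zip(x_est, observed, visible_mask)]
```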
LILAC: Long-sequence Incremental Low-latency Arbitrary Motion Stylization via Streaming VAE-Diffusion with Causal Decoding
Generating long and stylized human motions in real time is critical for applications that demand continuous and responsive character control. Despite its importance, existing streaming approaches often operate directly in the raw motion space, leading to substantial computational overhead and making it difficult to maintain temporal stability. In contrast, latent-space VAE-Diffusion-based frameworks alleviate these issues and achieve high-quality stylization, but they are generally confined to offline processing. To bridge this gap, LILAC (Long-sequence Incremental Low-latency Arbitrary Motion Stylization via Streaming VAE-Diffusion with Causal Decoding) builds upon a recent high-performing offline framework for arbitrary motion stylization and extends it to an online setting through a latent-space streaming architecture with a sliding-window causal design and the injection of decoded motion features to ensure smooth motion transitions. This architecture enables long-sequence real-time arbitrary stylization without relying on future frames or modifying the diffusion model architecture, achieving a favorable balance between stylization quality and responsiveness as demonstrated by experiments on benchmark datasets. Supplementary video and examples are available at the project page: https://pren1.github.io/lilac/.
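The sliding-window causal design can be sketched as incremental chunk processing that only ever sees the current chunk plus a bounded amount of past context, never future frames. Here the "model" is just an average standing in for the latent diffusion and causal decoder; the windowing logic is the point:

```python
from collections import deque

def stream_stylize(chunks, window=4):
    """Process motion chunks incrementally with a bounded causal window.

    Each output is computed from the current chunk and at most
    `window - 1` previous chunks, so latency stays constant and no
    future input is required.
    """
    history = deque(maxlen=window)  # sliding window of past context
    outputs = []
    for chunk in chunks:
        history.append(chunk)
        outputs.append(sum(history) / len(history))  # stand-in for the decoder
    return outputs
```

Causality is easy to check: changing a later chunk must not change any earlier output.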
FTIN: Frequency-Time Integration Network for Inertial Odometry
Zhang, Shanshan, Zhang, Qi, Wang, Siyue, Wu, Liqin, Wen, Tianshui, Zhou, Ziheng, Peng, Ao, Hong, Xuemin, Zheng, Lingxiang, Yang, Yu
However, high IMU sampling rates introduce substantial redundancy that impedes IO's ability to attend to salient components, thereby creating an information bottleneck. To address this challenge, we propose a cross-domain IO framework that fuses information from the frequency and time domains. Specifically, we exploit the global context and energy-compaction properties of frequency-domain representations to capture holistic motion patterns and alleviate the bottleneck. To the best of our knowledge, this is among the first attempts to incorporate frequency-domain feature processing into IO. Experimental results on multiple public datasets demonstrate the effectiveness of the proposed frequency-time-domain fusion strategy.
Index Terms -- Frequency-Domain Learning, Inertial Odometry, Inertial Measurement Unit signals
1. INTRODUCTION
Inertial odometry (IO) aims to reconstruct motion trajectories from high-frequency inertial measurement unit (IMU) signals -- comprising tri-axial accelerometer and gyroscope data -- in order to enable low-cost and robust localization [1, 2].
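A toy illustration of the frequency-time fusion idea: take a window of IMU samples, compute a compact frequency-domain descriptor (DFT magnitudes, which concentrate the signal's energy in a few coefficients), and concatenate it with simple time-domain statistics. This is a generic sketch, not the paper's actual feature extractor:

```python
import cmath
import math

def freq_time_features(signal):
    """Fuse frequency- and time-domain descriptors of one IMU window.

    Returns the non-redundant half of the DFT magnitude spectrum
    (global, energy-compact view of the motion) concatenated with the
    window mean and variance (local time-domain view).
    """
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):  # real signal: keep half-spectrum only
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(coeff) / n)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    return mags + [mean, var]
```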
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: The paper presents an extension of gated autoencoders to time-series data. The main idea is to use a gated autoencoder to model the time series in an autoregressive manner; predicting x_{t+1} from x_t using a gated autoencoder whose mapping unit values are initialised using a pair of contiguous datapoints. The paper introduces two interesting refinements: predictive training, and higher order relational features. Predictive training is a training criterion suitable for time series data that is different from the criterion normally used for gated autoencoders. Predictive training tries to minimise the square error in predicting x_{t+1} given x_{t} and the value of the mapping units that optimally predict x_{t} given x_{t-1}.
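The predictive criterion summarised above can be sketched in a few lines: mapping units encode the transformation from x_{t-1} to x_t via multiplicative (gated) interactions, and that inferred transformation is then re-applied to x_t to predict x_{t+1}, with squared error as the loss. This uses scalar stand-in weights `w_enc` and `w_dec` rather than the model's full weight matrices:

```python
def gated_predict(x_prev, x_curr, w_enc, w_dec):
    """Predict x_{t+1} from (x_{t-1}, x_t) with a gated interaction.

    The mapping units m are a multiplicative product of the two frames
    (capturing the transformation between them); the prediction applies
    that transformation to the current frame.
    """
    m = [w_enc * a * b for a, b in zip(x_prev, x_curr)]   # mapping units
    return [w_dec * mi * xc for mi, xc in zip(m, x_curr)]  # re-apply to x_t

def predictive_loss(pred, x_next):
    """Squared prediction error, the quantity predictive training minimises."""
    return sum((p - t) ** 2 for p, t in zip(pred, x_next))
```

If the series is constant, the identity transformation is recoverable and the loss vanishes.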
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization
Kim, Hyung Kyu, Lee, Sangmin, Kim, Hak Gu
Speech-driven 3D facial animation aims to synthesize realistic facial motion sequences from given audio, matching the speaker's speaking style. However, previous works often require priors such as class labels of a speaker or additional 3D facial meshes at inference, which makes them fail to reflect the speaking style and limits their practical use. To address these issues, we propose MemoryTalker, which enables realistic and accurate 3D facial motion synthesis by reflecting speaking style with audio input alone, to maximize usability in applications. Our framework consists of two training stages: the first stage stores and retrieves general motion (i.e., Memorizing), and the second stage performs personalized facial motion synthesis (i.e., Animating) with the motion memory stylized by the audio-driven speaking style feature. In this second stage, our model learns which facial motion types should be emphasized for a particular piece of audio. As a result, our MemoryTalker can generate a reliable personalized facial animation without additional prior information. With quantitative and qualitative evaluations, as well as a user study, we show the effectiveness of our model and its performance enhancement for personalized facial animation over state-of-the-art methods.
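The memory read in the first stage can be sketched as a standard key-value retrieval: score each stored motion key against the audio feature, softmax the scores, and return the weighted sum of stored motion values. All shapes and names are illustrative, not from the paper:

```python
import math

def retrieve_motion(audio_feat, keys, values):
    """Audio-guided read from a motion memory.

    Scores every memory key against the audio feature, normalizes with
    a softmax, and returns the weighted sum of the stored motion values,
    so the output is a convex combination of memorized motions.
    """
    scores = [sum(a * k for a, k in zip(audio_feat, key)) for key in keys]
    m = max(scores)
    es = [math.exp(s - m) for s in scores]
    z = sum(es)
    w = [e / z for e in es]
    dim = len(values[0])
    return [sum(w[i] * values[i][d] for i in range(len(values)))
            for d in range(dim)]
```

The second-stage stylization would then modulate `values` (or the read weights) with the audio-driven style feature before this read.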
EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation
Izzati, Fathinah, Li, Xinyue, Xia, Gus
We propose Expotion (Facial Expression and Motion Control for Multimodal Music Generation), a generative model leveraging multimodal visual controls - specifically, human facial expressions and upper-body motion - as well as text prompts to produce expressive and temporally accurate music. We adopt parameter-efficient fine-tuning (PEFT) on the pretrained text-to-music generation model, enabling fine-grained adaptation to the multimodal controls using a small dataset. To ensure precise synchronization between video and music, we introduce a temporal smoothing strategy to align multiple modalities. Experiments demonstrate that integrating visual features alongside textual descriptions enhances the overall quality of generated music in terms of musicality, creativity, beat-tempo consistency, temporal alignment with the video, and text adherence, surpassing both proposed baselines and existing state-of-the-art video-to-music generation models. Additionally, we introduce a novel dataset consisting of 7 hours of synchronized video recordings capturing expressive facial and upper-body gestures aligned with corresponding music, providing significant potential for future research in multimodal and interactive music generation.
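The temporal smoothing strategy mentioned above can be illustrated with a centered moving average over per-frame visual control features, damping frame-to-frame jitter before the controls condition the music model. This is a simple stand-in, not the paper's exact alignment procedure:

```python
def smooth_controls(feats, kernel=3):
    """Centered moving-average smoothing of per-frame control features.

    feats: one scalar control value per video frame; kernel: odd window
    size. Edges use a shrunken window so the output length matches the
    input length, keeping video and music frames in one-to-one alignment.
    """
    half = kernel // 2
    out = []
    for i in range(len(feats)):
        lo, hi = max(0, i - half), min(len(feats), i + half + 1)
        out.append(sum(feats[lo:hi]) / (hi - lo))
    return out
```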
Suite-IN++: A FlexiWear BodyNet Integrating Global and Local Motion Features from Apple Suite for Robust Inertial Navigation
Sun, Lan, Xia, Songpengcheng, Yang, Jiarui, Pei, Ling
The proliferation of wearable technology has established multi-device ecosystems comprising smartphones, smartwatches, and headphones as critical enablers for ubiquitous pedestrian localization. However, traditional pedestrian dead reckoning (PDR) struggles with diverse motion modes, while data-driven methods, despite improving accuracy, often lack robustness due to their reliance on a single-device setup. Therefore, a promising solution is to fully leverage existing wearable devices to form a flexiwear bodynet for robust and accurate pedestrian localization. This paper presents Suite-IN++, a deep learning framework for flexiwear bodynet-based pedestrian localization. Suite-IN++ integrates motion data from wearable devices on different body parts, using contrastive learning to separate global and local motion features. It fuses global features based on the data reliability of each device to capture overall motion trends and employs an attention mechanism to uncover cross-device correlations in local features, extracting motion details helpful for accurate localization. To evaluate our method, we construct a real-life flexiwear bodynet dataset, incorporating Apple Suite (iPhone, Apple Watch, and AirPods) across diverse walking modes and device configurations. Experimental results demonstrate that Suite-IN++ achieves superior localization accuracy and robustness, significantly outperforming state-of-the-art models in real-life pedestrian tracking scenarios.
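The reliability-based fusion of global features can be sketched as a weighted average in which each device's contribution is proportional to its estimated data reliability. The reliability scores here are given as inputs; in the paper they would come from the learned model:

```python
def fuse_global(device_feats, reliabilities):
    """Reliability-weighted fusion of per-device global motion features.

    device_feats: one feature vector per device (e.g. iPhone, Apple
    Watch, AirPods); reliabilities: one nonnegative score per device.
    Weights are normalized so the fused feature is a convex combination,
    letting more trustworthy devices dominate the overall motion trend.
    """
    z = sum(reliabilities)
    weights = [r / z for r in reliabilities]
    dim = len(device_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, device_feats))
            for d in range(dim)]
```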
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
Hong, Seokhyeon, Kim, Chaelin, Yoon, Serin, Nam, Junghyun, Cha, Sihun, Noh, Junyong
Text-driven motion generation has advanced significantly with the rise of denoising diffusion models. However, previous methods often oversimplify representations for the skeletal joints, temporal frames, and textual words, limiting their ability to fully capture the information within each modality and their interactions. Moreover, when using pre-trained models for downstream tasks, such as editing, they typically require additional efforts, including manual interventions, optimization, or fine-tuning. In this paper, we introduce skeleton-aware latent diffusion (SALAD), a model that explicitly captures the intricate inter-relationships between joints, frames, and words. Furthermore, by leveraging cross-attention maps produced during the generation process, we enable attention-based zero-shot text-driven motion editing using a pre-trained SALAD model, requiring no additional user input beyond text prompts. Our approach significantly outperforms previous methods in terms of text-motion alignment without compromising generation quality, and demonstrates practical versatility by providing diverse editing capabilities beyond generation. Code is available on the project page.
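The cross-attention maps that drive the zero-shot editing can be sketched as follows: each motion token (query) is scored against every word token (key), and each row is softmax-normalized into a distribution over words. Reweighting or swapping a word's column in these maps is the generic hook for attention-based editing; the shapes and the editing rule here are illustrative, not SALAD's exact mechanism:

```python
import math

def cross_attention(motion_q, word_k):
    """Cross-attention maps between motion tokens and word tokens.

    motion_q: list of motion-token query vectors; word_k: list of
    word-token key vectors. Returns one row per motion token, each a
    softmax distribution over words, i.e. how strongly that frame
    attends to each word of the prompt.
    """
    maps = []
    for q in motion_q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in word_k]
        m = max(scores)
        es = [math.exp(s - m) for s in scores]
        z = sum(es)
        maps.append([e / z for e in es])
    return maps
```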